Safe Model-Free Reinforcement Learning using Disturbance-Observer-Based Control Barrier Functions
Safe reinforcement learning (RL) with assured satisfaction of hard state
constraints during training has recently received significant attention. Safety
filters, e.g., those based on control barrier functions (CBFs), provide a
promising route to safe RL by modifying an RL agent's unsafe actions on the fly.
Existing safety-filter-based approaches typically involve learning the uncertain
dynamics and quantifying the learned model's error, which yields conservative
filters until enough data has been collected to learn a good model, thereby
preventing efficient exploration. This paper presents a method for safe
and efficient model-free RL using disturbance observers (DOBs) and control
barrier functions (CBFs). Unlike most existing safe RL methods that deal with
hard state constraints, our method does not involve model learning, and
leverages DOBs to accurately estimate the pointwise value of the uncertainty,
which is then incorporated into a robust CBF condition to generate safe
actions. The DOB-based CBF can be used as a safety filter with any model-free
RL algorithm, minimally modifying the agent's actions whenever
necessary to ensure safety throughout the learning process. Simulation results
on a unicycle and a 2D quadrotor demonstrate that the proposed method
outperforms a state-of-the-art safe RL algorithm that uses CBFs and
Gaussian-process-based model learning in terms of safety violation rate as well
as sample and computational efficiency.
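The filtering idea can be illustrated with a minimal sketch. Assuming a control-affine system ẋ = f(x) + g(x)u + d, a CBF h(x), and a DOB estimate d̂ of the disturbance, the robust CBF condition ∇h·(f + g u + d̂) + αh ≥ 0 is a single affine constraint on u, so the minimally-modifying QP has a closed-form projection. The function name and interface below are illustrative, not the authors' implementation:

```python
import numpy as np

def cbf_safety_filter(u_rl, grad_h, f, g, d_hat, h, alpha=1.0):
    """Minimally modify the RL action u_rl so that the robust CBF
    condition  grad_h @ (f + g @ u + d_hat) + alpha * h >= 0  holds.

    With one affine constraint a @ u >= b, the QP
        min ||u - u_rl||^2  s.t.  a @ u >= b
    is solved in closed form by projecting onto the half-space.
    """
    a = g.T @ grad_h                        # constraint normal in action space
    b = -alpha * h - grad_h @ (f + d_hat)   # required lower bound on a @ u
    if a @ u_rl >= b:                       # RL action already satisfies the CBF
        return u_rl
    # minimum-norm correction onto the constraint boundary
    return u_rl + a * (b - a @ u_rl) / (a @ a)
```

For example, with f = 0, g = I, ∇h = (1, 0), h = 0.5, d̂ = 0, the constraint reads u₀ ≥ -0.5: a safe action passes through unchanged, while an unsafe one is shifted only in its first component.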
Hallucination Improves the Performance of Unsupervised Visual Representation Learning
Contrastive learning models based on a Siamese structure have demonstrated
remarkable performance in self-supervised learning. This success of
contrastive learning relies on two conditions: a sufficient number of positive
pairs and adequate variation between them. If these conditions are not met,
such frameworks lack semantic contrast and are prone to overfitting. To
address these two issues, we propose the Hallucinator, which efficiently
generates additional positive samples for further contrast. The Hallucinator is
differentiable and creates new data in the feature space. Thus, it is optimized
directly with the pre-training task and introduces nearly negligible
computation. Moreover, we reduce the mutual information of hallucinated pairs
and smooth them through non-linear operations. This process helps avoid
over-confident contrastive learning models during training and yields more
transformation-invariant feature embeddings. Remarkably, we empirically
show that the proposed Hallucinator generalizes well to various contrastive
learning models, including MoCo v1/v2, SimCLR, and SimSiam. Under the linear
classification protocol, a stable accuracy gain is achieved, ranging from 0.3%
to 3.0% on CIFAR10 & 100, Tiny ImageNet, STL-10, and ImageNet. The improvement
is also observed when transferring pre-trained encoders to downstream tasks,
including object detection and segmentation. Comment: International Conference
on Computer Vision (ICCV), 202
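The abstract does not spell out the exact hallucination operations, but the general pattern of a differentiable feature-space positive generator can be sketched as follows. Everything here is an assumed stand-in (interpolation for the new positive, Gaussian noise for variation, `tanh` for the non-linear smoothing); the paper's actual operations differ:

```python
import numpy as np

def hallucinate_positive(z1, z2, sigma=0.1, rng=None):
    """Sketch of feature-space hallucination for a positive pair.

    Given two normalized view embeddings z1, z2 of shape (batch, dim),
    create an extra positive by (1) per-sample interpolation between the
    pair, (2) adding Gaussian noise for variation, and (3) smoothing with
    a non-linear map before re-normalizing. All steps are differentiable,
    so in a real framework the generator trains with the contrastive loss.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    lam = rng.random((z1.shape[0], 1))                  # per-sample mixing weight
    z_new = lam * z1 + (1.0 - lam) * z2                 # interpolated positive
    z_new = z_new + sigma * rng.standard_normal(z_new.shape)  # add variation
    z_new = np.tanh(z_new)                              # non-linear smoothing
    return z_new / np.linalg.norm(z_new, axis=1, keepdims=True)  # unit sphere
```

The output has the same shape as the inputs and lies on the unit sphere, so it can be appended to the batch as an extra positive for the standard contrastive loss.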